ABSTRACT

Three procedures used in stepwise regression analysis, whose Type I
error rate this paper examines, are forward selection, backward
elimination, and true stepwise. In the forward selection method, a
model of the dependent variable is formed by choosing the single best
predictor; then the second predictor, the one that makes the strongest
contribution to the prediction of the dependent variable after
controlling for the effects of the first variable, is chosen. The process
continues, each chosen variable increasing the prediction potential,
until the remaining variables fail to make any significant contribution.
Backward elimination begins with a model containing all predictors;
at each step, the variable whose removal results in the smallest
reduction of effectiveness is eliminated. The true stepwise procedure is
a variant of forward selection. To test these procedures, a Monte Carlo
computer program, written in FORTRAN IV, was prepared. The results
support two conclusions: (1) the probability of erroneously forming a
regression model increases as a function of the number of predictors;
and (2) as the inter-predictor correlation increases, the probability
of making errors decreases. Therefore, the number of predictors and
the inter-predictor correlation should be considered when attempting
to solve an error rate problem. (MH)
Controlling the Type I Error Rate
in Stepwise Regression Analysis

Stepwise regression has become a widely used technique for selecting
a subset of potential predictors for some dependent variable. Three
procedures have been used under the rubric of stepwise regression
analysis: forward selection, backward elimination, and true stepwise
(Draper and Smith).

The forward selection procedure forms a model of the dependent
variable (Y) by first selecting the best single predictor; then the second
predictor is chosen that makes the strongest contribution to the
prediction of Y, controlling for the effects of the first predictor. The
process continues so that, at each step, the variable selected for
inclusion in the model increases the prediction of Y more than any other
predictor. The selection process stops when the remaining variables fail
to contribute significantly to the prediction of Y. The backward
elimination procedure begins with a model containing all potential
predictors, and then at each step a variable is eliminated if its removal
from the model results in the smallest reduction in the model's
effectiveness. The elimination process continues until the removal of
any variable results in a significant reduction in the model's $R^2$. The
true stepwise procedure is a variant of the forward selection technique.
It differs from the forward selection procedure in that, at each step, a
variable that has been previously included in the model may be deleted
if a partial F-test shows that variable to be an insignificant predictor.

In most of the computer statistical packages that have stepwise
regression procedures, the criterion used for variable selection is an
F-test formed as follows:

$$F = \frac{(R_f^2 - R_r^2)/1}{(1 - R_f^2)/(N - p - 1)} \qquad (1)$$

where: $R_f^2$ = the coefficient of determination for the model
              containing all predictors included at previous steps,
              plus the variable under test,
       $R_r^2$ = the coefficient of determination for the model
              containing all predictors except the variable under test,
       N = the number of observations,
       p = the number of predictors used in the model that
              produced $R_f^2$.
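The forward selection and true stepwise procedures described earlier, with (1) as the selection criterion, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the IMSL RLSTEP routine used in the study: it fits least squares with NumPy and, for brevity, compares the partial F with fixed critical values (the hypothetical parameters `f_enter` and `f_remove`) instead of converting F to a significance level. Backward elimination is not shown.

```python
import numpy as np


def r_squared(X, y, cols):
    """R^2 from the OLS regression of y on an intercept plus the columns in cols."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    yc = y - y.mean()
    return 1.0 - (resid @ resid) / (yc @ yc)


def partial_f(r2_full, r2_reduced, n_obs, p):
    """Formula (1): F for one variable; p counts the predictors in the full model."""
    return (r2_full - r2_reduced) / ((1.0 - r2_full) / (n_obs - p - 1))


def stepwise(X, y, f_enter=4.0, f_remove=3.9, allow_removal=True, max_steps=100):
    """Forward selection (allow_removal=False) or true stepwise (allow_removal=True).

    Assumption: fixed F-to-enter / F-to-remove values stand in for a significance
    level. f_remove is kept below f_enter so a variable cannot cycle in and out.
    """
    n_obs, n_vars = X.shape
    model = []
    for _ in range(max_steps):
        # Entry step: the candidate with the largest partial F, if it exceeds f_enter.
        r2_cur = r_squared(X, y, model)
        best_j, best_f = None, f_enter
        for j in range(n_vars):
            if j in model:
                continue
            f = partial_f(r_squared(X, y, model + [j]), r2_cur, n_obs, len(model) + 1)
            if f > best_f:
                best_j, best_f = j, f
        if best_j is None:
            break  # no remaining variable contributes significantly
        model.append(best_j)
        # Deletion step (true stepwise only): drop included variables that
        # a partial F-test now shows to be insignificant.
        while allow_removal and len(model) > 1:
            r2_full = r_squared(X, y, model)
            fs = {j: partial_f(r2_full,
                               r_squared(X, y, [m for m in model if m != j]),
                               n_obs, len(model))
                  for j in model}
            worst = min(fs, key=fs.get)
            if fs[worst] >= f_remove:
                break
            model.remove(worst)
    return model


# Usage: two real predictors (columns 0 and 2) and one pure-noise column.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * rng.standard_normal(200)
print(stepwise(X, y, f_enter=10.0, f_remove=9.0))
```

The entry loop always reconsiders every excluded variable against the current model, which is what distinguishes forward selection from simply ranking simple correlations once.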



As with any statistical test, two kinds of inferential errors can
be made. A type I error occurs if a variable is selected, using the
F-ratio criterion, when that variable's population regression weight
is zero. A type II error occurs when a variable is not selected, using
the F-test criterion, when the variable has a non-zero population weight.

Most users of stepwise regression adopt one of the traditional
significance levels (.05 or .01) when evaluating the F-test in (1).
This significance level will determine the type I error rate for each
test. However, another perspective can be taken when considering the
type I error rate: the problem-wide error rate.

The problem-wide error rate is the probability of selecting any
variable when all variables have population regression weights of zero.
In other words, the problem-wide error rate is the probability of forming
a sample regression model when none should be formed. The rest of this
paper addresses this error rate, and a procedure will be presented that
allows researchers to control its value.

The problem-wide error rate is comparable to the family-wide error
rate commonly encountered in the context of post hoc tests conducted
after a significant effect has been found in an ANOVA. For example,
the probability of making one or more type I errors in a family of
orthogonal tests is:

$$\alpha_p = 1 - \prod_{i=1}^{k} (1 - \alpha_i) \qquad (2)$$

where: $\alpha_p$ = the family-wide error rate,
       k = the number of orthogonal tests,
       $\alpha_i$ = the significance level on test i.

When the $\alpha_i$'s are all equal to $\alpha_t$,

$$\alpha_p = 1 - (1 - \alpha_t)^k \qquad (3)$$

If a researcher wished to control $\alpha_p$ by reducing $\alpha_t$, (3) could be
solved for $\alpha_t$:

$$\alpha_t = 1 - (1 - \alpha_p)^{1/k} \qquad (4)$$

Alternately, the researcher could conservatively approximate $\alpha_t$
using the Bonferroni inequality,

$$\alpha_t = \alpha_p / k \qquad (5)$$
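Formulas (2) through (5) translate directly into code. The following minimal Python sketch (function names are illustrative, not from the paper) computes the family-wide rate for k orthogonal tests at a common level of .05, together with the per-test levels given by (4) and (5); the printed family-wide rates can be compared with the .0 row of Table 1.

```python
def family_wide_rate(alphas):
    """Formula (2): P(at least one type I error) over orthogonal tests."""
    p_none = 1.0
    for a in alphas:
        p_none *= 1.0 - a  # probability that test i makes no type I error
    return 1.0 - p_none


def per_test_alpha(alpha_p, k):
    """Formula (4): common per-test level that gives family-wide rate alpha_p."""
    return 1.0 - (1.0 - alpha_p) ** (1.0 / k)


def bonferroni_alpha(alpha_p, k):
    """Formula (5): Bonferroni approximation, slightly smaller than (4)."""
    return alpha_p / k


for k in (2, 5, 10, 20):
    print(k,
          round(family_wide_rate([0.05] * k), 3),
          round(per_test_alpha(0.05, k), 5),
          bonferroni_alpha(0.05, k))
```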

When the members of the family of tests are not orthogonal, formulas
(4) and (5) yield conservative values of $\alpha_t$. That is, the use of $\alpha_t$
from (4) or (5) will result in an $\alpha_p$ less than the desired value. The
solution for $\alpha_t$ is considerably more complex when the tests are not
orthogonal. The critical F that will maintain $\alpha_p$ at a desired value
should be found using the correlated F distribution (Pope and Webster,
1972). Unfortunately, the integration of the correlated F distribution is
an extremely tedious process, and only limited tables of critical values
derived from it are available. Consequently, an approximate solution was
sought using Monte Carlo methods.

METHOD 

A Monte Carlo program, written in FORTRAN IV, was prepared by the
author for this project. The program incorporated subroutines supplied
in the International Mathematical and Statistical Library (1975). The
IMSL subroutines were selected because of their proven accuracy and
efficiency. A copy of the program is supplied in Appendix I of this
paper.

The program generates sample data matrices (cases by variables)
sampled from a population with a given dispersion matrix. Subroutine
GGNRM was used for this purpose. Various population correlation matrices
were supplied to the subroutine, and a sample data matrix of standard
normal deviates was produced. In every case, the population correlations
between the predictors and the criterion variable were set equal to zero.
The inter-predictor correlations were set equal to a common value, and
for the various replications of this study, the inter-predictor
correlations were 0, .3, .5, .7, and .9. In addition, the numbers of
predictors used were 2, 3, 4, 5, 7, 10, and 20. For every combination
of the number of predictors and the inter-predictor correlation (35 in
all), a thousand sample data sets were generated.

Each data set so generated was then subjected to a stepwise
regression analysis using IMSL subroutine RLSTEP. Subroutine RLSTEP
uses the true stepwise procedure. Variable selection is governed by a
significance-testing process: when, at any step, no F-test is
significant, the selection process ceases.

For the purposes of this study, an error occurred when a model, other
than the null model, was formed by subroutine RLSTEP. The proportion of
analyses resulting in a model was treated as an empirical estimate of
the probability of erroneously forming a model using stepwise regression
analysis.
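The simulation design above can be sketched in Python as follows. This is not the author's FORTRAN IV program; it is a minimal stand-in under stated assumptions. Predictors with a common inter-predictor correlation rho are built from one shared normal factor (valid for rho >= 0), the criterion is generated independently of them, and the sample size `n_obs` is a hypothetical choice, since it is not given here. Because forward selection forms a model exactly when its first step admits a variable, only that first step is simulated; at that step (1) reduces to F = r²(N - 2)/(1 - r²). (Under true stepwise, the rare case in which a later deletion empties the model again is ignored.) The p-value of F(1, df) uses F = t² and the closed-form Student t series for integer df (Abramowitz and Stegun 26.7.3 to 26.7.4). All function names are hypothetical.

```python
import math
import random


def f1_pvalue(f, df):
    """Upper-tail p-value of an F(1, df) statistic for integer df (F = t**2)."""
    theta = math.atan(math.sqrt(f / df))
    s, c = math.sin(theta), math.cos(theta)
    if df == 1:
        inside = 2.0 * theta / math.pi
    elif df % 2 == 0:
        term = total = 1.0
        for j in range(1, (df - 2) // 2 + 1):
            term *= (2 * j - 1) / (2 * j) * c * c
            total += term
        inside = s * total
    else:
        term = total = c
        for j in range(1, (df - 3) // 2 + 1):
            term *= (2 * j) / (2 * j + 1) * c * c
            total += term
        inside = 2.0 / math.pi * (theta + s * total)
    return max(0.0, 1.0 - inside)  # P(|T| > sqrt(f)) for T ~ t(df)


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_model_error_rate(p, rho, n_obs, alpha, reps, seed=0):
    """Share of null data sets (all population weights zero) in which the
    first selection step admits a variable, i.e. a model is formed."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]       # criterion, independent of X
        shared = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # common factor
        best_f = 0.0
        for _ in range(p):
            # corr(x_i, x_j) = rho for every pair of predictors
            x = [math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                 for z in shared]
            r2 = pearson_r(x, y) ** 2
            best_f = max(best_f, r2 * (n_obs - 2) / (1.0 - r2))  # (1) at step 1
        if f1_pvalue(best_f, n_obs - 2) < alpha:
            errors += 1
    return errors / reps


print(null_model_error_rate(p=10, rho=0.5, n_obs=50, alpha=0.05, reps=1000))
```

Checking only the largest first-step F is enough because the p-value decreases as F increases: if the best candidate fails, every candidate fails.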

RESULTS 

Table 1 shows the results obtained when a variable selection
significance level of .05 is used. The entries in Table 1 are the
proportions of 1000 stepwise regression analyses that produced a sample
model when none should have been produced. For example, when a researcher
has ten potential predictors that have correlations with each other
equal to .50, the probability of erroneously forming a model is
approximately .308.

Since the values in Table 1 are empirical estimates of the actual
probabilities of making an error, there is some sampling error. The
magnitude of this sampling error can be conservatively estimated by finding
the standard error of a proportion when the proportion equals .5. Since
1000 replications were used to derive each table entry, the standard error
of a sample proportion will be less than or equal to .016. Consequently, a
conservative 68% confidence interval for the true probability of making an
error will be the tabled value ± .016.
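The conservative bound on the standard error can be checked directly, since p(1 - p) is largest at p = .5. A minimal sketch (the function name is illustrative), applied to the .308 entry of Table 1:

```python
import math


def max_proportion_se(n_reps):
    """Largest standard error a sample proportion can have; p(1 - p) peaks at p = .5."""
    return math.sqrt(0.5 * 0.5 / n_reps)


se = max_proportion_se(1000)
print(round(se, 4))  # 0.0158, reported in the text as .016

# Conservative 68% interval (plus or minus one standard error) for a Table 1 entry
estimate = 0.308
print(round(estimate - se, 3), round(estimate + se, 3))
```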

The figures in Table 1 support two conclusions: (1) the probability
of erroneously forming a regression model increases dramatically as a
function of the number of predictors, and (2) as the inter-predictor
correlation increases, the probability of making an error decreases.
Consequently, any solution to the error rate problem must take into
consideration the number of predictors and the inter-predictor correlation.

After Table 1 was prepared, an attempt was made to develop an 
algorithm that could be used to select a significance level for variable 
selection that would control the problem-wide error rate. 

The rationale for the algorithm presented here was based on the
formula that gives the family-wide error rate in k independent tests.
Formula (3) is reproduced here for this purpose:

$$\alpha_p = 1 - (1 - \alpha_t)^k \qquad (6)$$

All terms are defined in (3). If $\alpha_p$ and $\alpha_t$ are known, k can be
solved for as follows:

$$k = \frac{\ln(1 - \alpha_p)}{\ln(1 - \alpha_t)} \qquad (7)$$
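Applying (7) to a row of Table 1 takes only a few lines. This sketch (the name `implied_k` is illustrative) uses the .0 row with $\alpha_t$ = .05; its output should agree with the corresponding row of Table 2 up to the rounding of the three-decimal Table 1 entries.

```python
import math


def implied_k(alpha_p, alpha_t=0.05):
    """Formula (7): number of independent tests implied by a problem-wide rate."""
    return math.log(1.0 - alpha_p) / math.log(1.0 - alpha_t)


# Inter-predictor correlation .0 row of Table 1 (number of predictors: rate)
table1_rho0 = {2: .102, 3: .130, 4: .184, 5: .216, 7: .304, 10: .410, 20: .653}
for p, rate in table1_rho0.items():
    print(p, round(implied_k(rate), 2))
```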

Formula (7) was applied to each entry in Table 1, and the resulting
k values are given in Table 2. In producing Table 2, $\alpha_t$ was .05 and
$\alpha_p$ was taken as the corresponding value in Table 1. The k values in
Table 2 were then plotted as a function of various measures of the
inter-predictor correlation. Figure 1 shows one of these plots for the
10-predictor case. The k values were observed to be an inverse linear
function of $\rho_{xx}^2$, the squared inter-predictor correlation. The
following function was considered to be a reasonable approximation:

$$k = p - (p - 1)\rho_{xx}^2 \qquad (8)$$

where: p = the number of predictors,
       $\rho_{xx}^2$ = the squared inter-predictor correlation.

This function seemed suitable since, for the extreme cases of $\rho_{xx}^2$ = 0
and 1.0, (8) produces k values of p and 1, respectively. When $\rho_{xx}^2$ is
equal to 0, the problem-wide error rate should equal the $\alpha_p$ value given
by (6). Under this condition ($\rho_{xx}^2$ = 0) the error rate is directly
analogous to the family-wide error rate for a family of orthogonal tests.
When $\rho_{xx}^2$ is equal to 1, every predictor is linearly dependent on the
other predictors; hence there is in fact only one predictor. Formula (8)
yields a k value of 1 when $\rho_{xx}^2$ equals 1. In addition, inspection of
plots such as Figure 1 suggested that (8) was also accurate for estimating
k for values of $\rho_{xx}^2$ between 0 and 1.
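To see how (8) tracks the simulated values, the sketch below evaluates it for 10 predictors at the correlations used in the study and prints it beside the Table 2 entries. It only places the numbers side by side; it is not the procedure the author used to arrive at (8).

```python
def k_approx(p, rho_sq):
    """Formula (8): approximate effective number of independent tests."""
    return p - (p - 1) * rho_sq


# Table 2, 10 predictors: k derived from the Monte Carlo rates via (7)
table2_p10 = {0.0: 10.29, 0.3: 8.92, 0.5: 7.18, 0.7: 4.97, 0.9: 2.63}
for rho, k_mc in table2_p10.items():
    print(rho, k_mc, round(k_approx(10, rho ** 2), 2))
```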

Unfortunately, a researcher using stepwise regression never knows
$\rho_{xx}^2$, so it must be estimated. A less biased estimate of a squared
correlation coefficient can be obtained using the shrinkage formula
(McNemar, 1969). The estimate of $\rho_{xx}^2$ used for this study applies
that formula to each inter-predictor correlation, as follows.

Let $R_{pp}$ = the inter-predictor correlation matrix. Define each element
in $\hat{R}_{pp}$ as

$$\hat{r}_{ij}^2 = 1 - (1 - r_{ij}^2)\,\frac{N - 1}{N - 2} \qquad (10)$$

where: $r_{ij}^2$ = the square of the ijth element of $R_{pp}$,
       N = the number of observations.

Let

$$\bar{r}^2 = \frac{\sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \hat{r}_{ij}^2}{\tfrac{1}{2}(p^2 - p)} \qquad (11)$$

which is the mean of the off-diagonal elements of $\hat{R}_{pp}$.

The sample estimate of $\rho_{xx}^2$ is then substituted into (8) to obtain

$$k = p - (p - 1)\,\bar{r}^2 \qquad (12)$$

After k has been obtained via (12), $\alpha_t$ is obtained:

$$\alpha_t = 1 - (1 - \alpha_p)^{1/k} \qquad (13)$$

where $\alpha_p$ is the desired problem-wide error rate.
A concise worked example is given in Appendix II of this paper.
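The complete algorithm, formulas (10) through (13), can be sketched in Python as follows. The function name and the list-of-lists input are assumptions for illustration. Applied to the correlation matrix of Appendix II with N = 20 and a desired $\alpha_p$ of .05, it should give k of about 2.84 and $\alpha_t$ of about .0179. As in Appendix II, negative shrunken values from (10) are kept rather than truncated at zero.

```python
def selection_alpha(corr, n_obs, alpha_p=0.05):
    """Formulas (10)-(13): significance level for variable selection.

    corr: p x p inter-predictor correlation matrix (list of lists).
    Returns (k, alpha_t).
    """
    p = len(corr)
    # (10): shrunken squared correlation for each pair i < j (may be negative)
    shrunk = [1.0 - (1.0 - corr[i][j] ** 2) * (n_obs - 1) / (n_obs - 2)
              for i in range(p - 1) for j in range(i + 1, p)]
    r2_bar = sum(shrunk) / (0.5 * (p * p - p))    # (11): mean off-diagonal value
    k = p - (p - 1) * r2_bar                       # (12)
    alpha_t = 1.0 - (1.0 - alpha_p) ** (1.0 / k)   # (13)
    return k, alpha_t


# Worked example from Appendix II
R = [[1.0, 0.3, 0.5],
     [0.3, 1.0, 0.2],
     [0.5, 0.2, 1.0]]
k, alpha_t = selection_alpha(R, n_obs=20)
print(round(k, 4), round(alpha_t, 4))
```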

The validity of the proposed algorithm was then tested by modifying
the Monte Carlo program used to produce Table 1 so that it used (13) to
select an $\alpha_t$. The results of this validation study are presented in
Table 3. As can be noted in Table 3, the probability of erroneously forming
a model, using (13) to determine $\alpha_t$, approaches the desired value of
.05. There is a slight tendency for this procedure to produce conservative
values of $\alpha_t$. The average value of $\alpha_p$ in Table 3 is .045, and the
conservative nature of the procedure is most apparent for problems with
large numbers of predictors and high inter-predictor correlations.

DISCUSSION 

The type I error rate in stepwise regression analysis deserves 
serious consideration by researchers. The literature is replete with 
"significant" findings that fail the ultimate test of replication. One 
possible explanation for this state of affairs might lie in the increasing 
problem-wide error rate that can occur in stepwise regression analysis. 

If a researcher considers the problem-wide error rate important, he
or she should take some corrective action. Three possibilities exist,
depending on the kind of analysis contemplated. They are: (1) prior to
the stepwise analysis, conduct an omnibus test of the model containing all
potential predictors; (2) use the backward elimination procedure with
an $\alpha_t$ obtained by substituting the number of predictors for k in (13);
or (3) use the algorithm for obtaining $\alpha_t$ presented here, if a forward
selection or true stepwise procedure is used.
The Omnibus Test 

The analysis begins by forming a full model containing all
predictors. The $R^2$ for this model is tested for significance at the
$\alpha_p$ level. The F is obtained as follows:

$$F = \frac{R^2/p}{(1 - R^2)/(N - p - 1)} \qquad (14)$$

where: $R^2$ = the coefficient of determination for the model
              containing all potential predictors,
       p = the number of predictors,
       N = the number of cases.

This F ratio yields a simultaneous test of significance for all weights
in a model. Proceed with the analysis only if a significant F using (14)
is obtained.
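The omnibus statistic (14) is straightforward to compute. In this sketch the values $R^2$ = .30, N = 50, and p = 5 are hypothetical, and the function name is illustrative. The resulting F would be referred to the F distribution with p and N - p - 1 degrees of freedom at the $\alpha_p$ level; with SciPy, for example, `scipy.stats.f.sf(f, df1, df2)` returns the upper-tail probability.

```python
def omnibus_f(r2, n_obs, p):
    """Formula (14): overall F for the full model, with its degrees of freedom."""
    f = (r2 / p) / ((1.0 - r2) / (n_obs - p - 1))
    return f, p, n_obs - p - 1


f, df1, df2 = omnibus_f(0.30, 50, 5)
print(round(f, 3), df1, df2)
```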

The Backward Elimination Procedure

The backward elimination procedure is comparable to testing a
family of orthogonal hypotheses. At each step, the variance accounted
for in the dependent variable that is tested for each predictor is
independent of all other sources of variation. Consequently, the use of

$$\alpha_t = 1 - (1 - \alpha_p)^{1/p} \qquad (15)$$

will maintain $\alpha_p$ at its desired value.



Finally, the algorithm developed in this paper is recommended if a
forward selection or true stepwise procedure is used. When some covariance
among the predictors is present, (12) gives a k smaller than p, so the
value of $\alpha_t$ obtained using (13) will be greater than that obtained
using (15); the use of (13) will therefore produce a more powerful analysis.
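The comparison between (13) and (15) can be illustrated numerically. This sketch (function names illustrative; the $\bar{r}^2$ values are hypothetical) holds p = 10 and $\alpha_p$ = .05 fixed and varies the mean shrunken squared inter-predictor correlation: at $\bar{r}^2$ = 0 the two levels coincide, and as $\bar{r}^2$ grows, (12) lowers k and (13) yields a larger, less stringent $\alpha_t$.

```python
def alpha_backward(alpha_p, p):
    """Formula (15): per-test level for backward elimination (k = p)."""
    return 1.0 - (1.0 - alpha_p) ** (1.0 / p)


def alpha_forward(alpha_p, p, r2_bar):
    """Formula (13), with k from (12): forward selection or true stepwise."""
    k = p - (p - 1) * r2_bar
    return 1.0 - (1.0 - alpha_p) ** (1.0 / k)


for r2_bar in (0.0, 0.1, 0.3, 0.6):
    print(r2_bar,
          round(alpha_backward(0.05, 10), 5),
          round(alpha_forward(0.05, 10, r2_bar), 5))
```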
Table 1

Monte Carlo Estimates of the Probability of
Erroneously Forming a Sample Model Using
Stepwise Regression Analysis with
a Variable Selection Significance Level of .05

Inter-Predictor                 Number of Predictors
Correlation         2      3      4      5      7      10     20
    .0            .102   .130   .184   .216   .304   .410   .653
    .3            .101   .130   .178   .213   .275   .367   .552
    .5            .097   .128   .171   .196   .235   .308   .417
    .7            .085   .125   .140   .153   .185   .225   .314
    .9            .073   .094   .101   .111   .122   .126   .169
Table 2

k Values Derived Using Formula (7)
on the Values from Table 1

Inter-Predictor                  Number of Predictors
Correlation         2      3      4      5      7      10      20
    .0            2.10   2.72   3.96   4.74   7.06   10.29   20.63
    .3            2.08   2.72   3.82   4.67   6.27    8.92   15.70
    .5            1.99   2.67   3.66   4.25   5.22    7.18   10.52
    .7            1.73   2.60   2.94   3.24   3.98    4.97    7.35
    .9            1.48   1.92   2.08   2.28   2.54    2.63    3.61
Figure 1. Plot of k as a function of
$\rho_{xx}^2$ for 10 predictors.
Table 3

Monte Carlo Estimates of the Probability
of Erroneously Forming a Sample Model Using Stepwise
Regression Analysis with a Variable Selection Significance
Level Obtained Using Formula (13). The Desired $\alpha_p$ Was .05

Inter-Predictor                 Number of Predictors
Correlation         2      3      4      5      7      10     20
    .0            .052   .044   .058   .044   .048   .045   .055
    .3            .050   .045   .044   .055   .046   .047   .038
    .5            .060   .044   .041   .063   .041   .044   .042
    .7            .059   .050   .041   .046   .037   .031   .032
    .9            .045   .054   .056   .050   .033   .027   .011
APPENDIX I



Copy of the Computer Program Used 
in the Study 
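APPENDIX II

A Worked Example of the Algorithm
for Obtaining a Significance
Level for Variable Selection Using
Stepwise Regression

$$R_{pp} = \begin{bmatrix} 1.0 & .3 & .5 \\ & 1.0 & .2 \\ \text{sym} & & 1.0 \end{bmatrix} \qquad N = 20 \qquad \text{Desired model error rate} = .05$$

$$\hat{r}_{ij}^2 = 1 - (1 - r_{ij}^2)\,\frac{N - 1}{N - 2}$$

$$\hat{r}_{12}^2 = .0394 \qquad \hat{r}_{13}^2 = .2083 \qquad \hat{r}_{23}^2 = -.0133$$

$$\bar{r}^2 = \frac{\sum_{i=1}^{p-1} \sum_{j=i+1}^{p} \hat{r}_{ij}^2}{\tfrac{1}{2}(p^2 - p)} = .0781$$

$$k = p - (p - 1)\,\bar{r}^2 = 2.8438$$

$$\alpha_t = 1 - (1 - .05)^{1/k} = .0179$$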